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Fast and numerically stable circle fit 

H. Abdul-Rahman^ and N. Chernov^ 


Abstract 

We develop a new algorithm for fitting circles that does not have 
drawbacks commonly found in existing circle fits. Our fit achieves 
ultimate accuracy (to machine precision), avoids divergence, and is 
numerically stable even when htting circles get arbitrary large. Lastly, 
our algorithm takes less than 10 iterations to converge, on average. 
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1 Introduction 

Fitting circles and circular arcs to observed points is a basic task in pattern 
recognition and computer vision [DEiiiiiniiTiEiEicni HE]. Some authors 
assert that “most of the objects in the world are made up of circular arcs and 
straight lines”; see [min]. 

The classical least squares fit minimizes geometric distances from the 
observed points to the fitting circle: 

n 

F{a, b,R) = y^^\^/{xi - g)^ -k {yi - by - R] ^ -)■ min (1.1) 

i=l 

Here {xi,yi) denotes the observed points, (a, 6) the center and R the radius 
of the htting circle. 

The geometric ht fll.ip has many attractive features. It is invariant under 
translations, rotations, and scaling, i.e., the best htting arc does not depend 
on the choice of the coordinate system. It provides the maximum likelihood 
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estimate under standard statistical assumptions El Eli]- The minimization 
of geometric distances is often regarded as the most desirable solution^] of the 
htting problem, albeit hard to compute in some cases. 

Our paper is devoted to practical algorithms for minimization of fll.ip . 
Theoretical aspects of the circle htting problem are covered elsewhere: for 
the existence and uniqueness of the global minimum of the objective function 
F(a, 6, R) see El IH ESI E2], for its differentiability see Lemma 7 in [H], etc. 

Most authors solve (HU by general minimization algorithms, such as 
Gauss-Newton (GN) [HI [9] or Levenberg-Marquardt (LM) pE7] ; see a review 
[1] . Gircle-specihc schemes exist [El EH] but they converge linearly and often 
take hundreds of iterations BE], which makes them impractical. 

The GN and LM normally converge in 5-10 iterations, but they have a 
number of known issues. First, they occasionally diverge. It was shown [6] 
(see a detailed proof in B Section 3.8]) that there is a valley in the parame¬ 
ter space that extends to inhnity, along which the objective function slowly 
decreases but remains above its minimum value. Thus if the minimization 
algorithm starts in that valley (or gets there by chance) it will be forced to 
move away from the minimum of fll.ip . 

Numerical tests |6] show that if the initial guess is picked at random, the 
chance of divergence may be as high as 50%. If the initial guess is supplied by 
an algebraic circle £t (such as Kasa fit [H]), then the chances of divergence 
are very low, but it is still possible (see an example in [U Section 5.13]). 

One way to avoid divergence is to use the so-called algebraic parameters, 
in which the circle equation is y4(a;^ -|- y'^) + Bx + Cy -|- H = 0 with additional 
constraint = A AD -I- 1; see details in BE]. One can determine 

A, B, C, D for the best htting circle and then convert them to a, b, R. Such 
an algorithm was proposed in B, it indeed converges to a minimum 
of fll.ip from any starting point. But using algebraic parameters A, B,C,D 
leads to very complicated formulas, and the resulting algorithm is about 
10 times slower than the one using the geometric parameters a, b, R (see 
BE]). There is also a possible loss of accuracy at the stage of converting the 
algebraic parameters to the geometric ones. Our numerical tests (Section [7P 
demonstrate these drawbacks. 

The second issue is accuracy. Standard algorithms use an adaptive step 
procedure: if the value of the objective function fll.ip is not decreasing at the 

^In particular, it has been prescribed by a recently ratified standard for testing the 
data processing software for coordinate metrology [1] . 
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next iteration, the latter is rejected and the step is recomputed by adjusting 
the value of a control parameter. This ‘acceptance rule’ makes the accuracy 
of the parameter estimates no better than where e denotes the 

machine precision {e ^ 2-10“^® in the standard double precision arithmetic); 
we explain this below. In fact, most algorithms stop iterations whenever the 
step gets smaller than see mm- 

We modify this ‘acceptance rule’ so that the accuracy of the parameter 
estimates becomes 0{e), rather than This makes our algorithm 

numerically stable, in a formal sense [20]. This ultimate accuracy is achieved 
by using the norm of the gradient of the objective function, rather than the 
function itself, as the ‘acceptance criterion’; see below. 

One may wonder how many additional iterations it takes to reach this 
ultimate accuracy. The standard Gauss-Newton and Levenberg-Marquardt 
schemes are known to converge linearly in the vicinity of a minimum, so they 
might take 10-15 extra iterations. Instead, we employ a version of a ‘full 
Newton method’ [15] which guarantees quadratic convergence. So it takes 
just 1-2 extra iterations to reach the ultimate accuracy. 

The third issue is related to large circles. If the data points lie along 
a circular arc with low curvature, then the best htting circle has a large 
radius R and a far away center (a, b). This leads to catastrophic cancelation 
in the calculation of the function fll.ll) and its derivatives, so the resulting 
estimates get poor. As a remedy, one can use algebraic parameters [B] or other 
special parameters m, but again this leads to very complicated formulas and 
involves an inevitable loss of accuracy at the stage of conversion from one set 
of parameters to another. 

We resolve this issue while staying with the natural geometric parameters 
(a, b, R) at all times. For large circles we use different formulas for the objec¬ 
tive function fll.ip and its derivatives, which are mathematically equivalent 
to the standard formulas but are organized differently to avoid catastrophic 
cancelation; see Section (5] 

Our numerical tests show that the resulting algorithm has the following 
features: convergence to a local minimum of fll.ip from nearly any initial 
guess (in fact, it never failed to converge in our tests), the hnal accuracy 
0(s), the average number of iterations is only 7-8, and the average execution 
time is only 50% higher compared to standard GN and LM algorithms (which 
have issues as listed above). We tested all the other published algorithms and 
none of them came close to these characteristics; see Section [3 While the 
numerical tests may not ‘prove’ the superiority of our approach, we believe 
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that our theoretical resolution of the above difficult issues constitutes the 
real novelty of our work. 

The paper is organizes as follows. We begin with a standard modihca- 
tion and reduction of the objective function. In section [3] we propose a “two 
phase” acceptance rule to increase the accuracy from to 0{e). In 

section 0] we present our version of the full Newton method with Levenberg- 
Marquardt correction. In section 0] we derive formulas for the objective func¬ 
tion, its derivative and Hessian, that avoid catastrophic cancelation for large 
circles. In section [6] we describe a method that prevents divergence. In 
Section [7] we present our numerical tests. 

2 Reducing and Modifying the Problem 

Expression fll.ip can be reduced by eliminating the radius R, as the right 
hand side of fll.ip is a quadratic polynomial in R. Setting OF/OR = 0 and 
solving for R yields 

R = f where r* = \/(xj — a)^ -|- {i/i — by (2.1) 

Here we use the “sample mean” notation r = - Similar notation is 

used below for a; = ^ ^ Ym=i ^tc. Now fll.ip reads 

n 

F{a,b) = 

i=l 

n 

= + Vi) + n{a^ -I- 6^ — r^) — 2anx — 2bny (2.2) 

i=l 

Since the term + Vi) is constant, we can drop it, then divide F by 

n, and proceed to minimize the “reduced” objective function 

F{a, b) = + b‘^ — r‘^ — 2ax — 2by (2.3) 

Centering the data prior to the £t is known to help reduce round-off errors 
[3, 09]. This is done by translation x' = Xi — x and y'i = yi — y- It also 
helps to scale the data to make their values of order one. This is done by 
x'l = x'JS and y” = y[/S where S = [x'x’ + y'y'^!'^. 

After one estimates the circle center (a, b) using the centered and scaled 
data points, one needs to rescale and retranslate it by a i—)■ 5*0 -|- T and 
b Sb + y and then compute R by (12. ip . 
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F 


minimum of F 



We will assume that the data is already centered and scaled. In particular, 
this makes x = y = hence fl2.3p further reduces to 

h) = a^ + h'^ - (2.4) 

3 Using a Gradient Based Acceptance Rule 

The objective function is smooth, hence its gradient vanishes at any local 
minimum pmin (we use notation p = (a, 6)), thus 

.F(p^i„ + h)-^(p^in) = 0(||hf) 

This means that if ||h|| < then round-off errors make it impossible to 
reliably compare the values of -T(pmin + h) and J^(pmin) (see Fig. [1]). Thus, 
standard minimization schemes with adaptive step (GN, LM, or Trust Region 
[1]), which accept the next iteration if J^(pnext) < -^(Pcurrent), are doomed to 
stall in the 0 ( 6 ^ 2 ^ neighborhood of Pmin- We use this acceptance rule (we 
call it ARl) only outside that neighborhood. 

Inside the 0(£^/^) neighborhood of Pmin we use the norm of the gradient 
VJ^; i.e., we accept the next iteration provided || VJ^(pnext)|| < l|V.F(p 

current) || 
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(we call this rule AR2). Since is twice differentiable, we have 
V^(p„,in) = 0 and ||V^(p^i„ + h)|| =0(||h||) 


thus the numerically computed value of ||V-F(p)|| remains significantly dif¬ 
ferent from zero all the way down to ||h|| = 0(s); see Fig. |2l 



Figure 2: has linear behavior in the vicinity of the minimum 


More precisely, we abandon ARl and start using AR2 once 


||VUI<£. (e. 


(3.1) 


where e* = 3 x 10 ^ in double precision and 10 ^ in single precision |15j . 


4 Full Newton Minimization 


Fast (quadratic) convergence to a minimum of T in its vicinity re¬ 

quires the use of the full Newton method, i.e., exact formulas for the gradient 
and Hessian matrix of . By standard formulas [1], the gradient is 




a + ur 
b + vf 


(4.1) 
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and the Hessian matrix 



1 — u^ — rvv/r —uv + ruv/r 
—uv + ruv/r l—lf — ruu/r 


(4.2) 


where we nsed onr “sample mean” notation and denoted 


dvi Xi — a 



drj _ Hi-b 
dh Ti 


In particnlar, uv/r = ^ X]^tc. 

Onr adaptive step rnle is based on the Levenberg-Marqnardt type correc¬ 
tion to the Hessian matrix: 


nx = n + \i 

where A > 0 is a control parameter and I is the 2x2 identity matrix. The 
next iteration step is then compnted by 

h = ■ (VF) (4.3) 

For better accnracy, we nse an eigenvalne decomposition of "H 

n = QDQ^ (4.4) 

where Q is an orthogonal matrix and D = diag{(ii, 62 } is a diagonal matrix 
containing the eigenvalnes of l-i. Since "H is a 2 x 2 matrix, its eigendecompo- 
sition can be compnted by fast direct formnlas (withont nsing matrix algebra 
functions). Now 

•Ha = QDaQ'^ (4.5) 

where 

Da = diag{di + \,d 2 + A} (4.6) 

and (14.31) takes form 

h=-QD^iQ^VH (4.7) 

where = diag{l/((ii -|- A), 1/(^2 + A)}. 

We choose A so that (i) the matrix Ha is positive dehnite and, moreover, 
(ii) the step h is not too large. The latter means 

||h|| < hmax = OiIIpII + ao (4.8) 
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where Oq, «! < 1 are constants whose values can be selected empirically. 
In terms of fl4.7p . condition fl4.8p reads 


Dj 'gll < ftmm, g = Q’"VT' 


(4.9) 


For simplicity we replace the 2-norm with the maximum norm and get 


gl 


g2 

di -|- A 

'^max dllQ 

(^2 -l- A 


< h 


—1 '^max 


which gives a lower bound on A: 

A > Amin = max 


|gi| 

hma;) 


- di, 


|g2| 


-d. 


(4.10) 


(4.11) 


This not only enforces fl4.8p . but also guarantees that T-Lx is positive definite. 
Rather than imposing fl4.1ip . we adjust A as usual: if the iteration is accepted, 
A decreases by a certain factor (~ 0.1), otherwise it increases by a certain 
factor (~ 10). After the AR2 rule is adopted (i.e., once we get ||VJ^|| < e*), 
we actually set A = 0 after every successful iteration, so that the convergence 
would be truly quadratic. 

To summarize, here is the scheme of one iteration of our algorithm: 


One iteration of our algorithm 

• Given p = (a, 6), compute d^(p), V-F(p), and 'H(p) 

• if IIV-FII < e* then switch from ARl rule to AR2 rule 

• Compute the eigendecomposition (14.4p of the matrix 'H(p) 

• Compute the vector g = (gi, g 2 ) by fl4.9p 

• Compute hmax by (|M]) and Amin by (14.111) 

• If AR2 rule applies, set A = 0 

• while 

O If A < Amin) set A Amin 
Compute the step h by (14.7p 


o 




















o if ||h|| <£||p|| then terminate iterations and EXIT 
o Compute p' = p + h and -T(pO with VJ^(p') 
o if ARl applies: 

* if -T(p') < *T(p) then set Pnext = P^ reduce A, break 

* else (reject p') increase A, continue 
o if AR2 applies: 

* if ||V^(p')|| < ||V-T(p)|| 

then set Pnext = P^ reduce A, break 

* else (reject p') increase A, continue 

• end while 


5 Big Circle Formulas 

If our fitting circle is large and still passes near the data points, then either 
a or b must be large. In that case we use modified formulas for and its 
derivatives to avoid catastrophic cancelations and minimize round off errors. 
We use polar coordinates + b‘^ and 9 G [0, 27r) so that a = D cos 9 

and b = Dsm9. We denote 6 = 1/D and Zi = xj + yf. Now 

Ti = v^(a - XiY + {b- ViY = Dwi 

where Wi = \/l — 26pi + 6 '^Zi and pi = XiCos9 + yism9. We also have 
Ti = D + ji where 


7i = D{wi - 1) 

Further modihcation gives 


Tj 

1 + Wi' 


Ti = 2pi - 5zi 


li = -Vi + 


_ + Vili 

2 + (57i 


Averaging over i = 1,..., u gives p = 0 (because the data is centered so that 
X = y = D) and r = D + b'g, thus fl2.4p becomes 


D=-2g-5Y 


(5.1) 
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10° 1o' 10^ 10® 1o‘ 10° 10° 1o' 

The distance between the origin and the circie's center 


Figure 3: The number of accurate digits in the value of T computed by the 
standard rule fl2.4p and by the modihed rule fl5.ll) . The picture is almost 
identical for the gradient and Hessian. 


We found that fl5.ip gives a more accurate value of than fl2.4p even for 
relatively small circles, such as 1 < H < 10. For larger circles, D > 10, fl5.ip 
remains numerically stable while fl2.4p breaks down (see Figure [3]). 

Similar modihcations can be made for the formulas (14. ip and (14.21) . We 
omit algebraic details and give the hnal results. Denote 


ai = {xi + '^iCos9)/wi, I3i = {yi + 'fi sin 9)/wi 


Vi = 


Ki = 


li 


l + 2 + ^7^ 

Then using our sample mean notation we dehne 

P = \ (r77 - S T'yyK,), Q = ^{TK + z) 


and 

X = Wffj, Y = yyrj 
Now the gradient is given by 




Acos9-BX 

Asm9-BY 


(5.2) 
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where 


A = P + S\P + Q)Q, 


B = 1 + 6^Q 


The Hessian is 





U{2c - 6^U) - Qs2 -BN Us + Vc- 6^UV + Qsc + BL ' 
Us + Vc- 6^UV + Qsc + BL V{2s - 6^V) - Qc^ - BM 

(5.3) 


where we use shorthand notation c = cos 6 and s = sin 9 and denote 
U = {P + Q)c-X, V = {P + Q)s-Y 


and 


M = {'y'yrj — Q)c‘^ + 2(0777 — U)c + aar] 

N = (777 - Q)s^ + 2 (/ 57?7 - V)s + (3(3r] 

L = (777 — Q)cs + {Wffj — U)s + il3'yri — V)c + ajSr] 

The above formulas look more complicated than fl4.ip - fl4.2l) . Indeed a careful 
flop count shows that they require 55n+const flops, while fl4.ip - fl4.2p require 
19n+const flops, so the computational time roughly triples. 


6 Preventing divergence 

It is shown in |1] that the graph of the objective function P{a,b) has two 
valleys stretching toward inhnity in opposite directions. In one valley P{a, h) 
decreases toward its global minimum, so all the standard algorithms tend 
to move toward the minimum and converge. The other valley is separated 
from the minimum by a ridge, and in that valley P{a,b) decreases as the 
point (a, b) moves along the valley bottom toward inhnity. Thus standard 
algorithms tend to diverge. We refer to [H Sec. 3.7] for a detailed account. 

To prevent divergence, our program detects whether the current iteration 
falls in the “wrong” valley, in which case the algorithm restarts from a point 
in the “right” valley. 

The valleys stretch along the eigenvector corresponding to the smaller 
eigenvalue of the scatter matrix. The latter is given by 


S 


XX xy 
xy yy 


( 6 . 1 ) 
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(remember we assumed x = y = Q). Now suppose the coordinate system is 
rotated so that the x direction is aligned with the major eigenvector of S and 
the y direction with its minor eigenvector. This gives xy = Q and Tx > yy. 
Now the valleys stretch in the y (vertical) direction. 

Under these conditions it was proven in [H Sec. 3.9] that the location of 
the “wrong” valley is determined by the sign of xxy = ^ X] follows: 

(a) if xxy > 0, then the wrong valley lies below the x axis; 

(b) if Txy < 0, then the wrong valley lies above the x axis. 

Our algorithm includes the following block preventing divergence in the 
wrong valley (here L > 0 is a large constant; we use L = 100): 

Divergence prevention 

• if |a| > L or \b\ > L then 

• if the above condition occurs for the hrst time then 

o Compute the components of the scatter matrix S 
o Compute an eigendecomposition of S 

o Rotate the coordinate system aligning the x direction with the 
major eigenvector of S 

o Compute xxy and record its sign, Z =sign(xxy) 

• Compute the center coordinates (a, b) in the rotated system 

• if Zb < 0 (i.e., if the center is in the “wrong” valley) then 

• Restart the algorithm from a point in the “right” valley 

As a restarting point we choose (0, ZL) and rotate it back to the original 
coordinate system. 

Lastly, the best htting circle may not exist at all [U HSl 122] • In that 
case the data points are best htted by a line (more precisely, by the line 
passing through the origin and spanned by the major eigenvector of S). It 
was proven in Sec. 3.9] that this event can only occur if xxy = 0. Thus our 
algorithm abandons the search for the best circle whenever max{|a|, |6|} > L 
and \xxy\ = 0(e). In that case it returns the best htting line. 


12 









7 Numerical Results 


We have compared the proposed algorithm with a few others in a series of 
computer tests. As competitors, we chose the Gauss-Newton (GN) method 
described in 0 (the code was downloaded from the author’s web page), the 
Levenberg-Marquardt (LM) method (see [H Sec. 4.7]), and the Ghernov- 
Lesort (GL) £t based on algebraic circle parameters (see [6]). 

All these fits use the standard stopping rule: they terminate iterations 
whenever the step gets smaller than thus they only achieve suboptimal 
accuracy The GN method is actually able to achieve the optimal 

accuracy 0{e) if it continues iterations until the step gets smaller than e. 
We included this modified version (GNm) in our tests. 

In our tests we have generated samples of n = 8 points randomly, with 
a uniform distribution in the square [—1,1] x [—1,1]. Each sample was then 
centered and scaled as described in Section [2l Uniform distribution produced 
totally “chaotic” samples without any predehned pattern. It is, in a sense, 
the worst case scenario, corresponding to very large noise in data. Thus our 
algorithms were tested under the most challenging conditions. 

We initialized our htting algorithms in two different ways. First, we 
applied a non-iterative algebraic circle £t, such as Kasa £t [n]. Second, we 
chose the center (a, b) randomly in the square [—1,1] x [—1,1] and computed 
the radius R by fl2.ip . 

Table [T] shows how frequently each algorithm diverges and how many 
iterations it takes to converge (whenever they do converge). The GN and 
LM mostly diverge when the iteration gets into the “wrong valley”. The GL 
£t is designed to converge all the time, but due to its suboptimal accuracy it 
occasionally stalls. Our results are consistent with the ones reported in [B]. 


Initial guess 



Algebraic 
Kasa fit 

Randomly chosen 
in [—5, 5] X [—5, 5] 

Average # of 
iterations 

New 

0% 

0% 

8.1 

GNm 

0.07% 

75% 

29.7 

GN 

0.07% 

75% 

11.4 

LM 

0.07% 

23.5% 

11.8 

GL 

0% 

0.02% 

11.7 


Table 1: Percentage of divergencies and the average number of iterations 
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1 to 3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

New 

0 

0 

0 

0 

0 

0 

0 

0 

2 

2 

11 

31 

GNm 

0 

2 

1 

0 

2 

6 

2 

3 

2 

8 

13 

69 

GN 

0 

0 

59 

2459 

6215 

1210 

55 

2 

0 

0 

0 

0 

LM 

0 

2 

174 

8427 

1390 

7 

0 

0 

0 

0 

0 

0 

CL 

1 

1 

96 

3270 

5650 

953 

28 

1 

0 

0 

0 

0 


Table 2: The distribution of “bad” cases over k = 1,..., 14. The total 
number of runs is 10^. Superaccurate results {k > 15) are not shown. 


Next we tested the accuracy of each algorithm. For each generated sample 
we hrst htted a very precise circle by using the built-in function vpa (available 
in MATLAB Symbolic Toolbox) that is able to keep track of 32 (and more) 
accurate decimal digits. We regard it as “ideal” circle. Now each algorithm 
produced its own htting circle and we computed the relative error E in the 
estimated circle parameters, versus the “ideal” circle. Then we found k = 
[— log^Q E ], which is the number of zeros after the decimal point in the digital 
representation of E. This can be roughly interpreted as the number of correct 
(trustworthy) decimal digits in the parameters of the htted circle. Whenever 
k > 15, the approximation reaches the desired “superaccuracy”. The cases 
fc < 14 are regarded as “bad enough” to be of interest to us and are recorded. 

Table [2] shows the number of recorded “bad” cases for each k = 1, 2,..., 14 
for each algorithm, after 10^ runs. The table does not include the super 
accurate results {k > 15). All the algorithms here were initialized by the 
Kasa algebraic circle ht. We only included runs where all the algorithms 
converged to the same minimum of the objective function E (in all these 
cases E < 10“^). 

The table shows that the methods GN, LM, CL with a suboptimal stop¬ 
ping rule (terminating iterations once the step gets smaller than e^/^) give 
suboptimal accuracy with 5 < fc < 9 in most cases. The modihed GN and 
our new method use the optimal stopping rule and achieve better accuracy 
A; > 14 in most cases. But the modihed GN falters occasionally having rare 
bad cases down to /c = 4. And it takes almost 30 iterations, on average to 
reach the desired accuracy, while our method takes only 8 iterations. 

Our numerical results demonstrate the superiority of our new algorithm 
over the main existing algorithms. 
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